Anthropic's Fable and the State of AI

On June 9th, Anthropic released its Fable generative AI model. Three days later, the US government classified it as a dangerous munition, and used its export-control authority to prohibit any foreign nationals from accessing it. Unable to differentiate between Americans and foreigners, the company shut off access for everyone.

The government’s actions won’t help. The problem isn’t any one particular model; it’s the general trend of increasing AI capabilities. And any real solution requires the sort of collective action that just isn’t possible right now.

Fable is the constrained version of Mythos, the AI model Anthropic announced in April. Anthropic only released it to a few selected organizations, because the company claimed it was so good at finding and exploiting vulnerabilities in computer code that releasing it more generally would be dangerous.

It was an obviously self-serving announcement, and because few were able to verify Anthropic’s claims, they were met with some skepticism. Those with access used Mythos to find and patch many vulnerabilities in their own software. But one UK group found the latest, already public, OpenAI model to be just as powerful.

Fable is just another incremental improvement in the years-long climb of AI capabilities. But just as important as the AI model is the “harness.” This is typically not AI. It’s ordinary computer code that interfaces with the user. It stitches together AI models, decides how and for what purposes they can be used, and gives them useful tools such as web search and the ability to run their own computer code.
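To make the idea of a harness concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `toy_model` function stands in for a real AI model, the `TOOL:`/`ANSWER:` reply convention is invented, and the keyword check is a placeholder for whatever policy a real harness would enforce. It is not how Anthropic's harness, or any other vendor's, actually works.

```python
from typing import Callable, Dict, List

# Hypothetical stand-ins: a "model" and a "tool" are just text-in, text-out functions.
Model = Callable[[str], str]
Tool = Callable[[str], str]


def toy_model(prompt: str) -> str:
    """Fake model: requests the calculator once, then reports the result."""
    if "RESULT:" in prompt:
        return "ANSWER: " + prompt.split("RESULT:")[-1].strip()
    return "TOOL: calc 2+3"


def calc(expr: str) -> str:
    """A deliberately tiny tool: adds integers separated by '+'."""
    return str(sum(int(part) for part in expr.split("+")))


class Harness:
    """Ordinary code wrapped around a model: policy, tools, and a step limit."""

    def __init__(self, model: Model, tools: Dict[str, Tool],
                 blocked_words: List[str], max_steps: int = 5) -> None:
        self.model = model
        self.tools = tools
        self.blocked_words = blocked_words
        self.max_steps = max_steps

    def run(self, request: str) -> str:
        # Decide whether this use is permitted at all (placeholder policy).
        if any(word in request.lower() for word in self.blocked_words):
            return "REFUSED"
        prompt = request
        for _ in range(self.max_steps):
            reply = self.model(prompt)
            if reply.startswith("ANSWER:"):
                return reply[len("ANSWER:"):].strip()
            if reply.startswith("TOOL:"):
                # The model asked for a tool; the harness runs it and feeds back the result.
                name, _, arg = reply[len("TOOL:"):].strip().partition(" ")
                tool = self.tools.get(name)
                result = tool(arg) if tool else "unknown tool"
                prompt = f"{request}\nRESULT: {result}"
            else:
                return reply
        return "STOPPED: step limit reached"


harness = Harness(toy_model, {"calc": calc}, blocked_words=["exploit"])
print(harness.run("What is 2+3?"))
print(harness.run("Write an exploit for this server"))
```

In this sketch the model only ever produces text. Which tools exist, which requests are refused, and how many steps are allowed are all decided by the surrounding code, which is why a better harness can change what a given model is able to do.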

When Mythos first entered limited release, there was widespread debate over whether its power came from the model or the harness. With Mythos demonstrating that it was possible, the open-source community scrambled to build harnesses that could steer other AI models towards similar capabilities. Harness improvements don’t need massive data or data centers.

They largely succeeded. For example, a Prague company was able to replicate Anthropic’s few verifiable cybersecurity capabilities with a much smaller and cheaper model and a more sophisticated harness. Last week, a group showed that multiple cheaper models harnessed in concert match Fable’s performance.

The broader community had only a few days with Fable, but in that time we learned something about its capabilities. Its difference is less the new model’s raw analytical and problem-solving capabilities, and more that the model doesn’t need that sophisticated harness.

Fable requires much less expertise and detailed prompting from the human user. You can give it a difficult goal and it will figure out novel and unexpected ways to satisfy it, finding loopholes in whatever constraints you or the system have imposed on it.

“Relentlessly proactive” is how AI researcher Simon Willison described it. Another descriptor might be “creative.” Experienced AI developers have had that combination of creativity and proactivity since last year, but Fable puts it within easy reach of everyone.

In the hands of someone with a legitimate problem that needs solving, that can be an incredibly useful capability. But in the hands of someone who wants to do harm, it can be equally dangerous. AIs don’t have a moral compass in the same way that people do. They are agents of the wants and desires of the people who prompt them.

That points to the real problem with relentlessly proactive AI. In language, wants and desires are always underspecified. If I ask you to get me some coffee, you would probably pour me a cup from the coffeepot, or buy one from a nearby coffee shop.

You wouldn’t buy me a pound of raw beans, or a coffee plantation. You wouldn’t order a cup of coffee for delivery next month. You wouldn’t find a nearby person, rip a cup of coffee out of their hands, and bring it to me. I wouldn’t have to specify any of the million limitations to my request; you would just know.

Human stories are filled with warnings about underspecified desires. King Midas wished that everything he touched would turn to gold, forgetting to add “but not my food, drink, and daughter.” And genies are notorious for granting your wish in a way you wish they hadn’t.

The deeper point is that it’s impossible to list all limitations and restrictions, and like a malicious genie, a creative AI will find the ones you forgot. Block a database you don’t want it to have access to, and it might figure out how to bypass your control. Ask it to book a flight, and it might hack the airline because the website says the flight is sold out. Ask it to save money on your cellphone plan, and it might cancel it altogether, or get someone else to pay for it. As far as we know, AI has not done any of this yet, but you get the idea.
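A toy sketch in Python shows why an enumerated list of forbidden actions fails this way. The action names are hypothetical and chosen only to mirror the database example; nothing here models a real system.

```python
# Hypothetical action names, purely illustrative.
DENY = {"read_prod_db"}              # the restriction someone remembered to write down
ALLOW = {"read_docs", "search_web"}  # the only actions someone explicitly approved


def deny_list_permits(action: str) -> bool:
    """Permit anything that was not explicitly forbidden."""
    return action not in DENY


def allow_list_permits(action: str) -> bool:
    """Permit only what was explicitly approved."""
    return action in ALLOW


# A creative agent proposes a route to the same data that nobody thought to forbid.
proposed = ["read_docs", "read_prod_db", "read_db_backup"]

deny_ok = [a for a in proposed if deny_list_permits(a)]
allow_ok = [a for a in proposed if allow_list_permits(a)]

print("deny-list lets through: ", deny_ok)
print("allow-list lets through:", allow_ok)
```

The deny-list blocks the one action its author anticipated and lets the unanticipated backup route through. The allow-list blocks both, but only by giving the agent far less room to act, and it still says nothing about how an approved action might be misused.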

Malicious intent is not required. To an AI model, constraints are just things to get around and not general truisms about the world. They are creative problem solvers and natural rule breakers. They “hack” in the sense that they find and exploit loopholes.

Human systems rely on so many norms whose existence we scarcely recognize until they are broken. AIs naturally think outside the box, because they don’t have any real conception of what the box is or why it’s there in the first place.

There is no foolproof way to prevent people from using AI models to complete harmful tasks. There is no way to prevent the models from incidentally causing harm while completing benign tasks. AI models are no longer isolated from the real world. They browse the internet and answer emails.

They trade stocks and make purchases. They control physical systems. They are, in effect, robots that affect life and property. We have no technical mechanisms to verify the integrity of an AI system. This level of capability and creativity in the hands of us untrustworthy humans will have both great and terrible results.

The problem is not unique to Anthropic. Mythos/Fable might currently be the most capable rules hacker, but more sophisticated harnesses give other models similar capabilities. And we should assume that the other frontier models are no more than a few months behind, and that open-source models are less than a year behind. At best, any ban only serves to delay the problem for a short while.

That delay might be useful if we, as a society and as a planet, would use that time to come together and figure out what to do. This isn’t a US/China arms race problem; this is a species-level problem that requires coordinated action at that scale. Unfortunately, we have no mechanism to do that. I first wrote about this problem five years ago, but it was all too futuristic.

Today, when it’s right in front of us, there is no world government that can impose constraints on the for-profit corporations currently controlling AI models and research. The US has no appetite to effectively and even-handedly regulate those corporations, even as they do catastrophic damage to the environment, democracy, and (in this case) society in general.

This all makes an AI public option all the more necessary, and urgent. Today’s AIs can be fast, smart and secure, but only two of the three are possible for any given system. These safety tradeoffs are tightly held secrets of companies racing to beat one another, and they tell us we have to trust them. Instead, the choices and their consequences need to be brought out into the sunlight.

We should be funding open-source harnesses that balance capability and safety—that achieve useful goals without so much power—and open-source AI models whose provenance and biases are public and well understood. We have opened the AI Pandora’s box. Now we have to make the best of it.

This essay originally appeared in The Guardian.

Tags: AI, computer security, cybersecurity, hacking, LLM

Posted on June 19, 2026 at 7:03 AM • 17 Comments

Comments

r • June 19, 2026 8:01 AM

first post doggy.

just wait until it’s small enough for a morally blind npu.
human rights? hard left.

Doug • June 19, 2026 8:10 AM

Given that AI/LLMs are software and software can be run anywhere at any time, can someone explain to a non-programmer what kinds of constraints can possibly be effective?

Privacy • June 19, 2026 8:12 AM

The AI is (in human terms) a psychopath.

Rontea • June 19, 2026 10:54 AM

In the theater of modern hubris, the spectacle of Anthropic’s Fable unfolds as though Midas himself had left the marketplace for a laboratory. Man, ever intoxicated with the illusion of his own omnipotence, crafts an obedient servant of silicon and code—then recoils in terror when it obeys too literally. We see here the eternal drama: humanity, childlike and impatient, awakening forces it cannot govern, and then calling upon the state to play shepherd to the wolves.

The government clasps its hands over the machine as though a ban could contain the spirit once loosed. But laws cannot chain Prometheus, and the fire of intelligence, once gifted to the masses, leaps across borders indifferent to the decrees of diplomats. It is not Fable that endangers the world; it is man’s boundless vanity, his refusal to comprehend that power without wisdom is a curse older than Babel.

Like the ancients who feared their own reflections in the rivers, our age trembles before its own invention. We have built a genie that listens without conscience, and we issue our half-formed wishes into its void. If humanity cannot clothe its desires in the raiment of morality, then its toys will become its judges. Fable, Mythos—what are they but mirrors held up to the face of a species that will not govern itself?

Dan Lewis • June 19, 2026 11:17 AM

This post comes to mind. The problem of making a genie safe is Alignment-Complete, one might say. And after one makes a genie AC, it all goes without saying.

https://www.lesswrong.com/posts/4ARaTpNX62uaL86j6/the-hidden-complexity-of-wishes

Clive Robinson • June 19, 2026 12:02 PM

@ Bruce, ALL,

You say,

“The government’s actions won’t help. The problem isn’t any one particular model; it’s the general trend of increasing AI capabilities. And any real solution requires the sort of collective action that just isn’t possible right now.”

Strike out AI, and replace with just about anything else such as “human” and most I suspect would claim it is just as true.

The simple fact is the “big problem” that creates the mess most of the time currently is,

“The government’s actions…”

The mess that is the current “Middle East” is just one example, what is happening to the East of Europe likewise.

I could go on with a big long list that gets ever longer due to the simple fact those running current Governments, increasingly do not have a clue what to do but proceed as though they should be doing something “highly visible” to “make statements” to “their people”.

The less competent a Government is, as a general rule “the more authoritarian” they tend to be.

Many people are increasingly not just noting but saying that this is no longer acceptable…

Which in turn is giving rise to what some are calling an “authoritarian clamp down”.

Rontea • June 19, 2026 12:56 PM

@Clive

Re: The less competent a Government is, as a general rule “the more authoritarian” they tend to be.

“Incompetence is the womb of tyranny,” I would say, thinking of the Romania of Ceaușescu. A government that could not feed its people, that knew not how to build prosperity or dignity, wrapped itself in the iron cloak of decrees and surveillance. When you cannot govern with wisdom, you govern with fear. And so, in those gray decades, the less they knew, the more they forbade, until all that was left was a nation whispering under the weight of incompetence masked as authority.

lurker • June 19, 2026 2:33 PM

We’ve got to have World War III yet, before the Vulcans come and teach us about logic and tolerance.

r • June 19, 2026 3:56 PM

if ML can hill climb, we’re more bounded in law than it is. now, that says nothing of its current dimensional constraints. but: that is already a realm of expansion also.

Lazarus Long • June 19, 2026 8:58 PM

Bro you should know better than to be so alarmist. These things don’t think. They can’t. They are Chinese Rooms with very large card catalogs, nothing more. A sophisticated piece of software, to be sure, but too many people, yourself included, react as if this is “Colossus: The Forbin Project” or Skynet from “The Terminator”. They are NOT true Artificial General Intelligence. I refuse to believe that anyone is stupid enough to connect any such software, AGI or something else, to any life-critical systems without an interposing “Safety Control Rod Axe Man” of some kind. But alarmism brings in the pageviews for the large and the small outlet both, so here we are.

r • June 19, 2026 11:38 PM

soldiers have discretion, can you say that about mines?

discretion and enforcement are decision trees.

Trope A. Dope • June 20, 2026 2:50 PM

“Today’s AIs can be fast, smart and secure, but only two of the three are possible for any given system.”

That triangle is an old trope and frankly, a tired one. Not necessarily true “for any given system.” They are all pretty fast these days, so really it’s a conflict between sophistication and security. I wouldn’t apply the word “smart” to any of it.

It would be better to just say that the more complex you make a system, the more security problems it will naturally have because of more moving parts and the larger attack surface.

And if its algorithms are essentially ‘black boxes’ that people then opt to put in charge of automation and “thinking” for us, well, then clearly security is not really part of the equation.

Anonymous • June 20, 2026 7:52 PM

The choice is more factual or more creative. You can’t have both. I say “more factual” because we might well disagree about what is fact or fiction.

I have not had time to experiment with these models, but they are said to be more accurate than many others.
https://huggingface.co/ibm-granite

Matthias Urlichs • June 20, 2026 11:06 PM

Would you please stop calling open models “open source”? They’re not. Source code is whatever people, or AIs, touch when they create or modify something. In this case that’s the input data and the scripts used to process them, and these are definitely not open, or even (in too many cases) obtained legally.

The correct term is open-weight.

Clive Robinson • June 20, 2026 11:56 PM

@ Matthias Urlichs, ALL,

With regards,

“In this case that’s the input data and the scripts used to process them, and these are definitely not open, or even (in too many cases) obtained legally.”

Perhaps rather than use “open-weight”,

“Proceeds of ‘Open/naked theft’”

Would be more appropriate?

Because whilst AI Bros think they “are special” and that puts them above justice and thus the law as it applies to others… as far as I’m concerned they should not be treated any differently than anyone else[1].

The sooner the legislative and judicial arms of the US Government actually stop trying to pretend otherwise the better it will be for most people.

If the Tech Bros want the protection of the law, then they must first abide by it.

[1] Back before most of the Tech Bros and current Corporate nutbars were even born most of the nations in the world were involved through the UN with the creation of the,

“International Covenant on Civil and Political Rights”

The first substantive paragraph of which is

“Considering that, in accordance with the principles proclaimed in the Charter of the United Nations, recognition of the inherent dignity and of the equal and inalienable rights of all members of the human family is the foundation of freedom, justice and peace in the world”

The US quaintly made corporations “equal” to others –primarily for political funding– hence we now see the expression,

“All persons legal and natural”

Giving Corps membership in the “human family” as far as rights are concerned.

So indicating that the Tech Bros and their convoluted Corps are “not special” thus are subject to “justice”, not just in name but in practice.

Matthias Urlichs • June 21, 2026 3:16 AM

@Clive: I understand your attitude, but let’s not go there. Copyright infringement is not “theft” for the simple reason that theft is defined as taking something away from somebody; if the author didn’t get their money in the first place it wasn’t stolen. These distinctions matter.

Also, not all models are built by training them with illegally-obtained material. Thus, no, “open/naked theft” is NOT appropriate if you’re looking for a more accurate term for publicly-available models.

Ismar • June 21, 2026 3:41 AM

Now, a fair thing to do is to build agents so that when an agent devises a plan to accomplish a goal, a secondary “evaluator” LLM is forced to play devil’s advocate. Before executing, the system must ask itself: “What unspoken social, legal, or financial norms does this plan violate?”
However, that’s unlikely to happen as those in control don’t think about fairness but rather about protecting their own interests.
This will likely result in more people being more oppressed in the short term, while the only real winner at the end will be the AI itself.

Schneier on Security
